Feature Engineering Step by Step: ML Data Preparation

Rating: 4.3 (10 reviews)

Missing Data, Scaling, Feature Extraction, Selection, Advanced Techniques & Automated Feature Engineering

Created by Dr. Amar Massoud

Last updated 3/2026

English

What you'll learn

Understand and apply feature engineering techniques to improve model accuracy.
Implement automated feature engineering using libraries like Featuretools.
Identify and mitigate bias, ensuring fair and ethical feature selection.
Track and document feature versions for reproducibility and collaboration.

Course content

9 sections • 42 lectures • 1h 55m total length

Introduction • 6:05
Master feature engineering for machine learning with a practical, hands-on approach, applying techniques from basic feature creation to PCA, imputation, encoding, regularization, and cross-validation using Python code.
Our Use Case - Housing Price Prediction Dataset • 3:00
Apply feature engineering to a housing price dataset: impute missing values with the median, one-hot encode neighborhoods, derive house age, log-transform square footage, run a correlation analysis, and standardize the data.
Types of Features and Domain Knowledge • 3:23
Identify and transform feature types (categorical, numerical, temporal, and text) to improve machine learning models. Leverage domain knowledge to guide feature creation, selection, and interaction terms for better performance and interpretability.
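The housing use case steps can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the toy data, the column names (sqft, year_built, neighborhood, price), and the reference year used for house age are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy housing data; column names are illustrative assumptions.
df = pd.DataFrame({
    "sqft": [1500.0, 2400.0, np.nan, 1800.0],
    "year_built": [1990, 2005, 1975, 2015],
    "neighborhood": ["A", "B", "A", "C"],
    "price": [300000.0, 450000.0, 250000.0, 400000.0],
})

# Median imputation for the missing square footage.
df["sqft"] = df["sqft"].fillna(df["sqft"].median())

# House age relative to an assumed reference year.
REFERENCE_YEAR = 2026
df["house_age"] = REFERENCE_YEAR - df["year_built"]

# Log-transform square footage to reduce right skew.
df["log_sqft"] = np.log1p(df["sqft"])

# One-hot encode the neighborhood column.
df = pd.get_dummies(df, columns=["neighborhood"], prefix="nbhd")

# Correlation of the engineered features with the target.
numeric_cols = ["log_sqft", "house_age"]
corr = df[numeric_cols + ["price"]].corr()

# Standardize the numeric features to zero mean and unit variance.
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std(ddof=0)
```

In a real project the median, mean, and standard deviation would normally be computed on the training split only and then reused on validation and test data.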

Feature Transformation • 1:57
Explore feature transformation techniques for machine learning, including scaling, normalization, log and power transformations, binning, and one-hot encoding for categorical data.
Scaling and Normalization • 5:39
Learn to scale features with standardization (zero mean, unit variance) or min-max normalization (to the 0–1 range) to improve model training on the housing data.
Standardization of Data Using Python • 1:06
Normalization of Data Using Python • 0:59
Log Transformation, Power Transformation • 5:16
Binning Numerical Variables • 3:36
One-Hot Encoding vs. Label Encoding • 4:07
Compare one-hot encoding and label encoding for transforming categorical data into numerical features, weighing the added dimensionality of one-hot encoding against the ordinal assumptions that label encoding can introduce.
One-Hot Encoding & Label Encoding of Data Using Python • 1:02
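As a rough sketch of the transformations listed in this section, the snippet below uses plain pandas on made-up values. The series names, bin edges, and category labels are assumptions for illustration; the course lectures may use scikit-learn transformers instead.

```python
import pandas as pd

# Hypothetical prices; values chosen only to make the arithmetic easy to follow.
prices = pd.Series([100.0, 200.0, 300.0, 400.0], name="price")

# Standardization: zero mean, unit variance (population std, ddof=0).
standardized = (prices - prices.mean()) / prices.std(ddof=0)

# Min-max normalization to the 0-1 range.
normalized = (prices - prices.min()) / (prices.max() - prices.min())

# Binning a numeric variable into labeled ranges (bin edges are assumptions).
bins = pd.cut(prices, bins=[0, 150, 350, float("inf")], labels=["low", "mid", "high"])

# Label encoding maps each category to an integer, which implies an order.
condition = pd.Series(["poor", "good", "fair", "good"], name="condition")
label_encoded = condition.astype("category").cat.codes

# One-hot encoding adds one column per category and implies no order.
one_hot = pd.get_dummies(condition, prefix="cond")
```

Note that the integer codes here follow alphabetical category order (fair, good, poor), which is exactly the kind of unintended ordering that label encoding can introduce.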

Feature Extraction and Creation • 1:38
Learn feature extraction and creation techniques to enhance machine learning models. Apply polynomial and interaction features, and extract time-based and text features using word counts, TF-IDF, and IDF.
Polynomial Features • 3:33
Create polynomial features by raising existing features to powers and multiplying them together to reveal non-linear relationships that linear models miss. Use ridge or lasso regularization to balance complexity and reduce overfitting.
Polynomial Features Using Python • 1:09
Interaction Features • 3:31
Combine two or more original features to create interaction features that reveal how variables influence each other, helping models capture non-linear relationships and improve predictions without excessive dimensionality.
Interaction Features Using Python • 1:08
Extracting Features from Dates and Text • 3:41
Extract features from dates and text to convert patterns in data into numerical inputs. Use year, month, day, day of week, word counts, and TF-IDF to improve model predictions.
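The feature creation ideas in this section might look like the following in pandas. This is a hand-rolled sketch under assumed column names (bedrooms, sqft, sale_date, listing_text); it does not cover TF-IDF, and the lectures may rely on scikit-learn helpers rather than manual column arithmetic.

```python
import pandas as pd

# Hypothetical listings; all column names and values are illustrative.
df = pd.DataFrame({
    "bedrooms": [2, 3, 4],
    "sqft": [900.0, 1500.0, 2200.0],
    "sale_date": pd.to_datetime(["2024-01-15", "2024-06-30", "2025-12-01"]),
    "listing_text": ["cozy flat", "bright family home with garden", "large house"],
})

# Polynomial feature: a squared term lets a linear model fit curvature.
df["sqft_squared"] = df["sqft"] ** 2

# Interaction feature: the product of two original features.
df["bedrooms_x_sqft"] = df["bedrooms"] * df["sqft"]

# Date features extracted with the .dt accessor.
df["sale_year"] = df["sale_date"].dt.year
df["sale_month"] = df["sale_date"].dt.month
df["sale_day"] = df["sale_date"].dt.day
df["sale_dayofweek"] = df["sale_date"].dt.dayofweek  # Monday = 0

# Simple text feature: number of whitespace-separated words.
df["word_count"] = df["listing_text"].str.split().str.len()
```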

Feature Selection Techniques • 5:44
Identify the most informative features to boost model performance and reduce dimensionality using correlation matrices, ANOVA, chi-square tests, regularization, and tree-based methods.
Statistical Methods • 6:36
Explore statistical methods such as correlation matrices, ANOVA, and chi-square tests to select the features most strongly related to the target variable, as shown with the housing data.
Correlation Matrix in Python • 0:43
ANOVA (Analysis of Variance) in Python • 1:04
Chi-Square Test in Python • 1:14
Regularization Techniques • 3:02
Explore how regularization techniques like Lasso, Ridge, and Elastic Net prevent overfitting by penalizing regression coefficients, enable feature selection, and improve generalization in high-dimensional data with multicollinearity.
Lasso Technique in Python • 1:08
Ridge Technique in Python • 0:47
ElasticNet Technique in Python • 0:51
Tree-based Methods • 3:22
Explore tree-based methods like random forest and XGBoost for classification and regression, and learn how feature importance scores, computed for features such as bedrooms, bathrooms, and square footage, guide selection.
Random Forest in Python • 0:40
XGBoost in Python • 0:56
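Two of the selection approaches named above, Lasso coefficients and random forest importances, can be sketched with scikit-learn on synthetic data. The data-generating process, the feature names, and the alpha value are assumptions chosen so that one feature is pure noise; this is an illustration, not the course's housing dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): price depends on sqft and bedrooms only.
rng = np.random.default_rng(0)
n = 200
sqft = rng.normal(1800, 400, n)
bedrooms = rng.integers(1, 6, n).astype(float)
noise_feature = rng.normal(0, 1, n)
price = 150 * sqft + 10000 * bedrooms + rng.normal(0, 5000, n)

X = np.column_stack([sqft, bedrooms, noise_feature])
feature_names = ["sqft", "bedrooms", "noise_feature"]

# Lasso penalizes coefficient size, so features should share a common scale.
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=1000.0).fit(X_scaled, price)
lasso_coefs = dict(zip(feature_names, lasso.coef_))

# Random forest importances are normalized to sum to 1.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, price)
importances = dict(zip(feature_names, forest.feature_importances_))

for name in feature_names:
    print(f"{name}: lasso={lasso_coefs[name]:.1f}, importance={importances[name]:.3f}")
```

The L1 penalty tends to shrink the coefficient of an uninformative feature toward or to zero, while the forest assigns it a small importance score; either signal can be used to rank features for removal.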

Advanced Feature Engineering Techniques • 7:48
Discover advanced feature engineering techniques, including target encoding, feature hashing, and principal component analysis, plus time series features like lag and rolling statistics for housing price prediction.
Target Encoding in Python • 1:00
Python Program for Feature Hashing • 0:44
Python Program for PCA • 0:57
Feature Engineering in Time Series: Lag features, Rolling Statistics in Python • 0:53
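Target encoding and the time series features mentioned above could be sketched as below. The data frames, column names, and window size are assumptions for illustration, and the target encoding shown is the naive version computed on the full data.

```python
import pandas as pd

# Naive target encoding: replace each category with the mean target value.
homes = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "B"],
    "price": [200.0, 300.0, 400.0, 500.0, 600.0],
})
neighborhood_means = homes.groupby("neighborhood")["price"].mean()
homes["neighborhood_te"] = homes["neighborhood"].map(neighborhood_means)

# Hypothetical monthly median prices for lag and rolling features.
sales = pd.DataFrame({
    "month": pd.date_range("2025-01-01", periods=6, freq="MS"),
    "median_price": [300.0, 310.0, 305.0, 320.0, 330.0, 325.0],
}).set_index("month")

# Lag feature: the previous month's value.
sales["price_lag_1"] = sales["median_price"].shift(1)

# Rolling mean of the three previous months; shifting first keeps the
# current month's value out of its own feature.
sales["price_roll_mean_3"] = sales["median_price"].shift(1).rolling(window=3).mean()
```

Because naive target encoding uses each row's own target, it can leak information; a common mitigation is to compute the category means on training folds only.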

Requirements

Basic understanding of Python and machine learning concepts.
Familiarity with data analysis libraries like Pandas and Scikit-learn.
No advanced experience in feature engineering required—you’ll learn it here.

Description

Unlock the full potential of your machine learning models with our comprehensive course on Feature Engineering. Designed for data science enthusiasts, machine learning practitioners, and developers, this course covers essential and advanced feature engineering techniques that will elevate your model’s performance, accuracy, and interpretability.

From handling missing data and transforming features to automated feature engineering with libraries like Featuretools, you'll learn the skills to create powerful, relevant features. Discover key techniques like scaling, normalization, one-hot encoding, and feature extraction. Understand when to apply polynomial and interaction features to uncover deeper patterns, and leverage time-based features for time series data. This course also introduces crucial ethical considerations, showing you how to avoid bias, ensure fairness, and enhance interpretability in your features.

Through hands-on examples, a consistent real-world use case, and Python code for each method, you’ll gain practical experience you can apply immediately. You’ll also learn best practices for documentation and version control, ensuring your features are organized and reproducible. Finally, with continuous learning and iteration techniques, you'll be equipped to keep your models relevant and effective as data evolves.

Whether you’re looking to refine your feature engineering skills or automate your workflow, this course provides the knowledge and tools to build high-performing, ethical models. Enroll today and take a step toward mastering feature engineering in machine learning!

Who this course is for:

Data science enthusiasts looking to deepen their skills in feature engineering.
Beginner and intermediate data scientists aiming to improve model performance.
Machine learning practitioners who want practical, hands-on experience.
Developers interested in ethical AI and responsible data practices.

Course content

Introduction • 3 lectures • 12min

Handling Missing Data • 3 lectures • 12min

Scaling and Normalization of Data • 8 lectures • 24min

Feature Extraction and Creation • 6 lectures • 15min

Feature Selection Techniques • 12 lectures • 26min

Advanced Feature Engineering Techniques • 5 lectures • 11min

Automated Feature Engineering • 2 lectures • 5min

Best Practices and Tips • 1 lecture • 9min

Conclusion • 2 lectures • 2min
